# Zero-shot transfer

**Cultureclip** (by lukahh) · Text-to-Image, Transformers · 20 downloads · 0 likes. A vision-language model fine-tuned from CLIP-ViT-B/32 for image-text matching tasks.

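As a rough usage sketch: CLIP-family checkpoints score how well candidate captions match an image, which is what enables zero-shot transfer to new label sets. The snippet below uses the public base checkpoint `openai/clip-vit-base-patch32` as a stand-in, since the listing does not give the fine-tuned model's repo id.

```python
# Zero-shot image-text matching with a CLIP-family model. Swap in the
# CultureCLIP repo id (not given in the listing) to use the fine-tune.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of two cats", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))  # higher score = better match
```
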
**Ipa Whisper Base** (by neurlang, Apache-2.0) · Speech Recognition, Safetensors, Multilingual · 599 downloads · 6 likes. A multilingual speech recognition model fine-tuned from Whisper-base that outputs transcriptions in the International Phonetic Alphabet (IPA).

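A minimal transcription sketch with the transformers ASR pipeline; the repo id `neurlang/ipa-whisper-base` and the audio filename are assumptions, not confirmed by the listing.

```python
# IPA transcription via the automatic-speech-recognition pipeline.
# The repo id and audio path below are assumptions.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="neurlang/ipa-whisper-base")
result = asr("speech_sample.wav")  # any local audio file
print(result["text"])  # transcription rendered in IPA symbols
```
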
**Snowflake Arctic Embed M V2.0 Cpu** (by cnmoro, Apache-2.0) · Text Embedding, Transformers, Multilingual · 502 downloads · 3 likes. Snowflake Arctic Embed M v2.0 is a multilingual sentence embedding model focused on sentence-similarity tasks, supporting over 50 languages.

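A minimal sentence-similarity sketch using the sentence-transformers library with the upstream checkpoint `Snowflake/snowflake-arctic-embed-m-v2.0`; the CPU-oriented variant listed here should load the same way under its own repo id.

```python
# Multilingual sentence similarity with sentence-transformers.
# trust_remote_code=True is assumed to be needed for the model's custom code.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "Snowflake/snowflake-arctic-embed-m-v2.0", trust_remote_code=True
)
sentences = ["The cat sits on the mat.", "Eine Katze sitzt auf der Matte."]
embeddings = model.encode(sentences)
print(util.cos_sim(embeddings, embeddings))  # pairwise cosine similarities
```
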
**Aimv2 3b Patch14 336.apple Pt** (by timm) · Image Classification, Transformers · 35 downloads · 0 likes. AIMv2 is an image encoder distributed through the timm library, suited to image feature extraction tasks.

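A minimal feature-extraction sketch, assuming the timm model name `aimv2_3b_patch14_336.apple_pt` implied by the listing and a recent timm release that includes the AIMv2 family.

```python
# Image feature extraction with timm; num_classes=0 removes the classifier
# head so the forward pass returns pooled embeddings.
import timm
import torch

model = timm.create_model(
    "aimv2_3b_patch14_336.apple_pt", pretrained=True, num_classes=0
)
model.eval()
x = torch.randn(1, 3, 336, 336)  # stand-in for a preprocessed RGB image
with torch.no_grad():
    features = model(x)
print(features.shape)  # (1, embedding_dim)
```
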
**Vesselfm** (by bwittmann, Other license) · Image Segmentation · 153 downloads · 4 likes. VesselFM is a foundation model for universal 3D vascular segmentation in any imaging domain.

**Zcabnzh Bp** (by nanxiz, BSD-3-Clause) · Image-to-Text, Transformers · 19 downloads · 0 likes. BLIP is a unified vision-language pretraining framework that excels at tasks such as image captioning and visual question answering, with performance improved by an innovative data-filtering mechanism.

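A minimal captioning sketch with the image-to-text pipeline, using the well-known BLIP checkpoint `Salesforce/blip-image-captioning-base` as a stand-in because the listing does not give this model's repo id.

```python
# BLIP image captioning via the image-to-text pipeline.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
print(captioner(url)[0]["generated_text"])  # a one-sentence caption
```
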
**Zoedepth Nyu Kitti** (by Intel, MIT) · 3D Vision, Transformers · 20.32k downloads · 5 likes. ZoeDepth is a depth-estimation model fine-tuned on the NYU and KITTI datasets, capable of estimating depth values in actual metric units.

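A minimal metric depth-estimation sketch, assuming the repo id `Intel/zoedepth-nyu-kitti` and a transformers release recent enough to include ZoeDepth support.

```python
# Metric depth estimation with ZoeDepth via the depth-estimation pipeline.
from transformers import pipeline

depth = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")
result = depth("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"].save("depth_map.png")  # PIL image visualizing per-pixel depth
```
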
**Zoedepth Nyu** (by Intel, MIT) · 3D Vision, Transformers · 1,279 downloads · 1 like. ZoeDepth is a monocular depth-estimation model fine-tuned on the NYU dataset, capable of zero-shot transfer and metric depth estimation.

**Meditron 7b Llm Radiology** (by nitinaggarwal12, Apache-2.0) · Large Language Model, Transformers · 26 downloads · 1 like. An open-source model released under the Apache-2.0 license; no further details are provided.

**NLLB Az** (by omar07ibrahim, MIT) · Large Language Model, Transformers · 35 downloads · 9 likes. A model released under the MIT license; no further details are provided.

**Dpt Swinv2 Large 384** (by Intel, MIT) · 3D Vision, Transformers · 84 downloads · 0 likes. A DPT model with a SwinV2 backbone for monocular depth estimation, trained on 1.4 million images.

**Dpt Swinv2 Tiny 256** (by Intel, MIT) · 3D Vision, Transformers · 2,285 downloads · 9 likes. A DPT model with a SwinV2 backbone for monocular depth estimation, trained on 1.4 million images.

**Dpt Beit Large 384** (by Intel, MIT) · 3D Vision, Transformers · 135 downloads · 0 likes. A monocular depth-estimation model with a BEiT backbone, capable of inferring detailed depth information from a single image.

**Donut Web** (by laverdes, Apache-2.0) · Large Language Model, Transformers · 14 downloads · 0 likes. No description is provided for this model.

**Dpt Hybrid Midas** (by Intel, Apache-2.0) · 3D Vision, Transformers · 224.05k downloads · 94 likes. A monocular depth-estimation model based on a hybrid Vision Transformer (ViT), trained on 1.4 million images.

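A minimal relative depth-estimation sketch with `Intel/dpt-hybrid-midas`; the DPT SwinV2 and BEiT variants listed above load the same way if you swap in their repo ids.

```python
# Monocular depth estimation with DPT via the depth-estimation pipeline.
from transformers import pipeline

depth = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")
result = depth("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"].save("dpt_depth.png")  # grayscale relative-depth map
```
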
**Scinertopic** (by RJuro, MIT) · Sequence Labeling, Transformers · 71 downloads · 7 likes. A scientific-term recognition model based on SciBERT, supporting NER-enhanced topic modeling.

**Bde Cner Batteryonlybert Uncased Base** (by batterydata, MIT) · Large Language Model, Transformers · 1,128 downloads · 2 likes. A model released under the MIT license; no further details are provided.

**Infoxlm Large** (by microsoft) · Large Language Model, Transformers · 1.1M downloads · 12 likes. InfoXLM is a cross-lingual pretraining framework based on information theory, designed to enhance cross-lingual representation learning by maximizing mutual information between languages.

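A minimal sketch of extracting cross-lingual sentence representations from the encoder, assuming the repo id `microsoft/infoxlm-large`.

```python
# Cross-lingual sentence representations from InfoXLM's encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/infoxlm-large")
model = AutoModel.from_pretrained("microsoft/infoxlm-large")

inputs = tokenizer(
    ["Hello, world!", "Bonjour le monde !"], return_tensors="pt", padding=True
)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
embeddings = hidden[:, 0]  # first-token vectors as sentence embeddings
print(embeddings.shape)
```
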
**Gigabert V4 Arabic And English** (by lanwuwei) · Large Language Model · 24 downloads · 5 likes. GigaBERT-v4 continues pretraining GigaBERT-v3 on code-mixed data, improving zero-shot transfer from English to Arabic on information extraction (IE) tasks.

**Gigabert V3 Arabic And English** (by lanwuwei) · Large Language Model, Multilingual · 38 downloads · 3 likes. GigaBERT-v3 is a bilingual BERT model customized for English and Arabic, pretrained on a large-scale corpus, that excels at information extraction tasks.

**Mdeberta V3 Base** (by microsoft, MIT) · Large Language Model, Transformers, Multilingual · 692.08k downloads · 179 likes. mDeBERTa is the multilingual version of DeBERTa; it uses ELECTRA-style pretraining with gradient-disentangled embedding sharing and performs strongly on cross-lingual tasks such as XNLI.

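The base checkpoint ships as a masked-language-model encoder; a common pattern is to fine-tune it on NLI data before using it for zero-shot classification. A minimal fill-mask sketch, assuming the repo id `microsoft/mdeberta-v3-base`:

```python
# Masked-token prediction with the base mDeBERTa-v3 checkpoint.
from transformers import pipeline

fill = pipeline("fill-mask", model="microsoft/mdeberta-v3-base")
for candidate in fill("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 3))
```
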
**Bart Large Xsum** (by facebook, MIT) · Text Generation, English · 20.44k downloads · 35 likes. A large summarization model based on the BART architecture, fine-tuned on the XSum dataset, excelling at generating concise news summaries.

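A minimal abstractive-summarization sketch, assuming the repo id `facebook/bart-large-xsum`.

```python
# Abstractive summarization with BART fine-tuned on XSum.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-xsum")
article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and is the tallest structure in Paris. It was the first "
    "structure in the world to surpass 300 metres."
)
print(summarizer(article, max_length=40, min_length=5)[0]["summary_text"])
```
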
**Infoxlm Base** (by microsoft) · Large Language Model, Transformers · 20.30k downloads · 7 likes. InfoXLM is a cross-lingual pretraining framework based on information theory, designed to enhance model performance on cross-lingual tasks by maximizing mutual information.

**Fact Or Opinion Xlmr El** (by lighteternal, Apache-2.0) · Text Classification, Transformers, Multilingual · 1,051 downloads · 22 likes. A binary classification model based on the XLM-RoBERTa-base architecture that labels sentences as facts or opinions; it supports English and Greek and features zero-shot learning capability.

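A minimal classification sketch; the repo id `lighteternal/fact-or-opinion-xlmr-el` is an assumption inferred from the listing.

```python
# Fact-vs-opinion sentence classification; the repo id is an assumption.
from transformers import pipeline

clf = pipeline("text-classification", model="lighteternal/fact-or-opinion-xlmr-el")
print(clf(["Water boils at 100 degrees Celsius.", "This movie is wonderful."]))
```
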
**Xlm Roberta Base Finetuned Shona** (by Davlan, Apache-2.0) · Large Language Model, Transformers · 13 downloads · 3 likes. An open-source model under the Apache-2.0 license; no further details are provided.